mruby 4.0.0
mruby is the lightweight implementation of the Ruby language
This document describes mruby's compilation pipeline for developers working on the parser, code generator, or bytecode format.
Read this if you are: adding new syntax or modifying the parser, debugging codegen issues (wrong registers, missing opcodes), working with the .mrb binary format, or understanding how Ruby constructs map to bytecode.
The lexer and parser are combined in a single Lrama/Bison grammar file: mrbgems/mruby-compiler/core/parse.y.
The parser maintains extensive state in mrb_parser_state. Among other things, this state determines how +/- are interpreted (sign vs operator) and whether newlines are significant.
The parser produces an AST using two node types:
node_type, lineno, and filename_index
Key node types include NODE_SCOPE (new variable scope), NODE_STMTS (statement sequence), NODE_IF, NODE_WHILE, NODE_CALL (method call), NODE_DEF (method definition), NODE_CLASS, NODE_RESCUE, NODE_ENSURE, etc. See mrbgems/mruby-compiler/core/node.h for the complete list.
Local variables are tracked per-scope during parsing:
- local_add(sym): register a new local variable in the current scope
- local_var_p(sym): check if a symbol is a local variable (affects whether an identifier is parsed as a method call or variable reference)

The code generator (mrbgems/mruby-compiler/core/codegen.c) walks the AST and emits bytecode into mrb_irep structures.
Each lexical scope (method, block, class body) has its own codegen_scope.
Scopes nest for blocks, method definitions, and class/module bodies. Each scope produces one mrb_irep.
The code generator uses a simple stack-based register allocator:
Register 0 holds self. push() increments sp and tracks the high-water mark in nregs. pop() decrements sp. The allocator is linear; it does not reuse temporaries within an expression.
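The push/pop behavior described above can be sketched in a few lines of C. This is a minimal illustration, not the code in codegen.c: the reg_scope type and the reg_push/reg_pop names are hypothetical, and only the sp and nregs fields named in the text are modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified scope: only the two fields named in the text. */
typedef struct {
  uint16_t sp;     /* index of the next free register */
  uint16_t nregs;  /* high-water mark: registers this scope needs */
} reg_scope;

/* Reserve one temporary register and return its index. */
static uint16_t reg_push(reg_scope *s)
{
  uint16_t r = s->sp++;
  if (s->sp > s->nregs) s->nregs = s->sp;  /* remember the peak */
  return r;
}

/* Release the most recently reserved register. */
static void reg_pop(reg_scope *s)
{
  assert(s->sp > 0);  /* popping an empty stack is a codegen bug */
  s->sp--;
}
```

Note that nregs never shrinks: it records the deepest point the stack reached, which is the figure the VM needs when it reserves a frame for the resulting irep.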
Instructions are emitted via helper functions:
- genop_0(opcode): no operands
- genop_1(opcode, a): one operand (auto-extends with OP_EXT1 if a > 255)
- genop_2(opcode, a, b): two operands (auto-extends with OP_EXT1/2/3 as needed)
- genop_3(opcode, a, b, c): three operands
- genop_W(opcode, a): 24-bit operand
- genop_2S(opcode, a, b): one 8-bit + one 16-bit operand

The code generator performs limited peephole optimizations, such as removing redundant OP_MOVE instructions and combining consecutive literal loads. Optimization is disabled at jump targets and when no_optimize is set in the compilation context.
Loop constructs (while, until, for, blocks) push a loopinfo structure that tracks jump destinations:
- pc0: destination for next
- pc1: destination for redo
- pc2: destination for break

Loop types (LOOP_NORMAL, LOOP_BLOCK, LOOP_FOR, LOOP_BEGIN, LOOP_RESCUE) determine how break/next/redo behave.
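As a rough picture of how such a structure nests, the sketch below keeps loop records on a linked stack. The loop_sketch type, its prev field, and the loop_enter/loop_leave helpers are hypothetical names for illustration; the real loopinfo in codegen.c has more fields and resolves its destinations differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { LOOP_NORMAL, LOOP_BLOCK, LOOP_FOR, LOOP_BEGIN, LOOP_RESCUE } loop_type;

/* Hypothetical stand-in for loopinfo. */
typedef struct loop_sketch {
  loop_type type;
  uint32_t pc0;              /* destination for next */
  uint32_t pc1;              /* destination for redo */
  uint32_t pc2;              /* destination for break */
  struct loop_sketch *prev;  /* enclosing loop, or NULL at the outermost level */
} loop_sketch;

/* Push a new loop record; it becomes the innermost loop. */
static void loop_enter(loop_sketch **top, loop_sketch *lp, loop_type type)
{
  assert(top != NULL && lp != NULL);
  lp->type = type;
  lp->pc0 = lp->pc1 = lp->pc2 = 0;
  lp->prev = *top;
  *top = lp;
}

/* Pop the innermost loop record, restoring the enclosing one. */
static void loop_leave(loop_sketch **top)
{
  assert(top != NULL && *top != NULL);
  *top = (*top)->prev;
}
```

With this shape, a next, redo, or break statement only needs to look at the top record to find its jump destination, and the type field tells it which kind of construct it is leaving.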
The compiled bytecode is stored in mrb_irep (Instruction REPresentation).
Pool entries store constants referenced by instructions:
| Type | Tag | Description |
|---|---|---|
| IREP_TT_STR | 0 | Dynamic string (heap allocated) |
| IREP_TT_SSTR | 2 | Static string (read-only) |
| IREP_TT_INT32 | 1 | 32-bit integer |
| IREP_TT_INT64 | 3 | 64-bit integer |
| IREP_TT_FLOAT | 5 | Floating-point number |
| IREP_TT_BIGINT | 7 | Arbitrary-precision integer |
The code generator deduplicates pool entries: identical strings and equal numeric values share the same pool index.
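Deduplication of this kind amounts to a lookup before every append. The sketch below shows the idea for strings only, using a linear scan; str_pool, POOL_MAX, and pool_intern are hypothetical names, and the actual code generator also handles numeric entries and uses its own storage.

```c
#include <assert.h>
#include <string.h>

#define POOL_MAX 64  /* arbitrary capacity for this sketch */

/* Hypothetical pool that holds only strings. */
typedef struct {
  const char *items[POOL_MAX];
  int len;
} str_pool;

/* Return the pool index of s, appending it only if no equal entry exists.
   Returns -1 when the pool is full. */
static int pool_intern(str_pool *p, const char *s)
{
  assert(p != NULL && s != NULL);
  for (int i = 0; i < p->len; i++) {
    if (strcmp(p->items[i], s) == 0) return i;  /* reuse the existing entry */
  }
  if (p->len == POOL_MAX) return -1;
  p->items[p->len] = s;
  return p->len++;
}
```

Two references to the same literal therefore compile to the same pool index, which keeps the pool small at the cost of a scan per literal.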
Exception handler entries are appended after the instruction sequence in memory.
During exception unwinding, handlers are searched in reverse order (last to first) for the current PC.
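A reverse search like this can be sketched as follows. The handler_entry layout and the half-open range test are assumptions made for illustration, not the layout or comparison the VM uses; the sketch also assumes that an inner handler is recorded after the handler that encloses it, so that searching from the end finds the innermost one first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler entry: covers pcs in [begin, end), jumps to target. */
typedef struct {
  uint32_t begin;
  uint32_t end;
  uint32_t target;
} handler_entry;

/* Search from the last entry to the first and return the first entry
   whose range covers pc, or NULL if none does. */
static const handler_entry *
find_handler(const handler_entry *tab, size_t n, uint32_t pc)
{
  assert(tab != NULL || n == 0);
  for (size_t i = n; i > 0; i--) {
    const handler_entry *h = &tab[i - 1];
    if (h->begin <= pc && pc < h->end) return h;
  }
  return NULL;
}
```

For a table holding an outer entry {0, 20, 100} followed by an inner entry {5, 10, 200}, a pc of 7 selects the inner handler and a pc of 15 falls back to the outer one.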
Standard instructions use 8-bit operands. When a value exceeds 255, extension prefixes widen operands to 16 bits:
| Prefix | Effect |
|---|---|
| OP_EXT1 | First operand (a) becomes 16-bit |
| OP_EXT2 | Second operand (b) becomes 16-bit |
| OP_EXT3 | Both a and b become 16-bit |
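To make the prefix rules concrete, here is a small emitter for a two-operand instruction. It is a sketch under stated assumptions: the SK_EXT* byte values are placeholders (the real opcode numbers are defined in include/mruby/ops.h), the prefix is assumed to occupy one byte ahead of the opcode, and 16-bit operands are assumed to be written high byte first. The emit_bb, put8, and put16 names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder byte values for the prefixes, not the real opcode numbers. */
enum { SK_EXT1 = 0xF1, SK_EXT2 = 0xF2, SK_EXT3 = 0xF3 };

static size_t put8(uint8_t *p, uint8_t v) { p[0] = v; return 1; }

/* Assumed byte order: high byte first. */
static size_t put16(uint8_t *p, uint16_t v)
{
  p[0] = (uint8_t)(v >> 8);
  p[1] = (uint8_t)(v & 0xff);
  return 2;
}

/* Write "op a b" into buf, adding a prefix when an operand exceeds 255.
   Returns the number of bytes written (at most 6). */
static size_t emit_bb(uint8_t *buf, uint8_t op, uint16_t a, uint16_t b)
{
  size_t n = 0;
  int wide_a = a > 255, wide_b = b > 255;
  assert(buf != NULL);
  if (wide_a && wide_b) n += put8(buf + n, SK_EXT3);
  else if (wide_a)      n += put8(buf + n, SK_EXT1);
  else if (wide_b)      n += put8(buf + n, SK_EXT2);
  n += put8(buf + n, op);
  n += wide_a ? put16(buf + n, a) : put8(buf + n, (uint8_t)a);
  n += wide_b ? put16(buf + n, b) : put8(buf + n, (uint8_t)b);
  return n;
}
```

Under these assumptions an unprefixed instruction takes 3 bytes, one widened operand brings it to 5, and widening both brings it to 6.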
Instruction formats:
| Format | Layout | Size |
|---|---|---|
| Z | opcode only | 1 byte |
| B | opcode + a(8) | 2 bytes |
| BB | opcode + a(8) + b(8) | 3 bytes |
| BBB | opcode + a(8) + b(8) + c(8) | 4 bytes |
| BS | opcode + a(8) + b(16) | 4 bytes |
| BSS | opcode + a(8) + b(16) + c(16) | 6 bytes |
| S | opcode + a(16) | 3 bytes |
| W | opcode + a(24) | 4 bytes |
See opcode.md for the full instruction table.
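The sizes in the format table follow directly from the operand widths: one opcode byte plus the sum of the operand bits. The sketch below derives them that way; the insn_format enum and insn_size function are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
  FMT_Z, FMT_B, FMT_BB, FMT_BBB, FMT_BS, FMT_BSS, FMT_S, FMT_W, FMT_COUNT
} insn_format;

/* Operand widths in bits for each format, in a, b, c order (0 = absent). */
static const unsigned char operand_bits[FMT_COUNT][3] = {
  [FMT_Z]   = {0, 0, 0},
  [FMT_B]   = {8, 0, 0},
  [FMT_BB]  = {8, 8, 0},
  [FMT_BBB] = {8, 8, 8},
  [FMT_BS]  = {8, 16, 0},
  [FMT_BSS] = {8, 16, 16},
  [FMT_S]   = {16, 0, 0},
  [FMT_W]   = {24, 0, 0},
};

/* Size in bytes of one unprefixed instruction, opcode byte included. */
static size_t insn_size(insn_format f)
{
  size_t bits = 8;  /* the opcode itself */
  assert(f < FMT_COUNT);
  for (int i = 0; i < 3; i++) bits += operand_bits[f][i];
  return bits / 8;
}
```

A disassembler or bytecode walker can use a table like this to step from one instruction to the next without decoding the operands.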
OP_ENTER encodes a method's argument layout in a 24-bit value (W format). The bit fields are defined by the MRB_ARGS_* macros.
Example: def foo(a, b=1, *rest, &block) produces an aspec with 1 required, 1 optional, rest flag set, and block flag set.
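The example above can be worked through in C. The bit positions used here (required count at bits 18-22, optional count at bits 13-17, rest flag at bit 12, block flag at bit 0) are an assumption about the layout and should be checked against the MRB_ARGS_* definitions in the mruby headers; the SK_ARGS_* macros and aspec_make are hypothetical stand-ins so the block compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions; verify against the real MRB_ARGS_* macros. */
#define SK_ARGS_REQ(n)  (((uint32_t)(n) & 0x1f) << 18)
#define SK_ARGS_OPT(n)  (((uint32_t)(n) & 0x1f) << 13)
#define SK_ARGS_REST()  ((uint32_t)1 << 12)
#define SK_ARGS_BLOCK() ((uint32_t)1)

/* Build an aspec from required/optional counts and the rest/block flags. */
static uint32_t aspec_make(unsigned req, unsigned opt, int rest, int block)
{
  uint32_t a;
  assert(req <= 31 && opt <= 31);  /* each count is a 5-bit field */
  a = SK_ARGS_REQ(req) | SK_ARGS_OPT(opt);
  if (rest)  a |= SK_ARGS_REST();
  if (block) a |= SK_ARGS_BLOCK();
  return a;
}

/* Field extractors, the inverse of the macros above. */
static unsigned aspec_req(uint32_t a)   { return (a >> 18) & 0x1f; }
static unsigned aspec_opt(uint32_t a)   { return (a >> 13) & 0x1f; }
static unsigned aspec_rest(uint32_t a)  { return (a >> 12) & 1; }
static unsigned aspec_block(uint32_t a) { return a & 1; }
```

Under this layout, aspec_make(1, 1, 1, 1) for def foo(a, b=1, *rest, &block) yields 0x43001, which fits in the 24 bits available to the W format.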
The presym system pre-allocates symbol IDs at build time for frequently used method names and operators. This avoids runtime string interning for common symbols.
Generated by lib/mruby/presym.rb, the presym table maps symbol names to compile-time constants:
| Macro | Example | Symbol |
|---|---|---|
| MRB_SYM(name) | MRB_SYM(initialize) | :initialize |
| MRB_SYM_B(name) | MRB_SYM_B(map) | :map! |
| MRB_SYM_Q(name) | MRB_SYM_Q(nil) | :nil? |
| MRB_SYM_E(name) | MRB_SYM_E(name) | :name= |
| MRB_OPSYM(op) | MRB_OPSYM(add) | :+ |
| MRB_IVSYM(name) | MRB_IVSYM(name) | :@name |
| MRB_CVSYM(name) | MRB_CVSYM(count) | :@@count |
| MRB_GVSYM(name) | MRB_GVSYM(stdout) | :$stdout |
Precompiled bytecode is stored in the RITE binary format.
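Before handing a buffer to a loader, a caller may want a cheap sanity check. The sketch below assumes that a RITE image begins with the four ASCII bytes RITE; that detail is not stated in this document and should be confirmed against src/load.c. The looks_like_rite function is a hypothetical helper, not part of the mruby API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if buf starts with the assumed "RITE" identifier, else 0.
   This checks the identifier only; it does not validate the image. */
static int looks_like_rite(const unsigned char *buf, size_t len)
{
  assert(buf != NULL || len == 0);
  return len >= 4 && memcmp(buf, "RITE", 4) == 0;
}
```

A check like this only rejects obviously wrong input early; the loader remains responsible for full validation.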
Loading functions:
- mrb_load_irep(mrb, bin): load and execute from byte array
- mrb_load_irep_buf(mrb, buf, len): load with explicit size (safer)
- mrb_read_irep(mrb, bin): load without executing (returns mrb_irep*)
- mrb_load_irep_file(mrb, fp): load from file

The mrbc command-line tool performs ahead-of-time compilation.
| Limit | Value |
|---|---|
| Max nesting depth | 256 (MRB_CODEGEN_LEVEL_MAX) |
| Max local variables | 255 (uint16 nlocals) |
| Max symbols per irep | 65535 |
| Max operand (standard) | 255 (8-bit) |
| Max operand (extended) | 65535 (16-bit) |
| File | Contents |
|---|---|
| mrbgems/mruby-compiler/core/parse.y | Lrama/Bison grammar |
| mrbgems/mruby-compiler/core/y.tab.c | Generated parser |
| mrbgems/mruby-compiler/core/codegen.c | Code generator |
| mrbgems/mruby-compiler/core/node.h | AST node types |
| include/mruby/irep.h | IRep structure definition |
| include/mruby/compile.h | Compiler context API |
| include/mruby/ops.h | Opcode definitions |
| src/load.c | Binary format loader |
| src/dump.c | Binary format writer |
| lib/mruby/presym.rb | Presym table generator |